Online checkpointing with improved worst-case guarantees

Authors

Karl Bringmann

Benjamin Doerr

Adrian Neumann

Jakub Sliacan

Abstract

In the online checkpointing problem, the task is to continuously maintain a set of k checkpoints that allow an ongoing computation to be rewound faster than by a full restart. The only operation allowed is to replace an old checkpoint with the current state. We seek checkpoint placement strategies that minimize the rewinding cost: at all times T, when requested to rewind to some time t ≤ T, the number of computation steps that must be redone to reach t from a checkpoint before t should be as small as possible. In particular, we want the closest checkpoint earlier than t to be no further from t than q_k times the ideal distance T/(k + 1), where q_k is a small constant. Improving on earlier work showing 1 + 1/k ≤ q_k ≤ 2, we show that q_k can be chosen asymptotically less than 2. We present algorithms with asymptotic discrepancy q_k ≤ 1.59 + o(1), valid for all k, and q_k ≤ ln(4) + o(1) ≤ 1.39 + o(1), valid when k is a power of two. Experiments indicate the uniform bound p_k ≤ 1.7 for all k. For small k, we show how to use a linear programming approach to compute good checkpointing algorithms. This gives discrepancies of less than 1.55 for all k < 60. We prove the first lower bound that is asymptotically greater than one, namely q_k ≥ 1.30 − o(1). We also show that optimal algorithms (yielding the infimum discrepancy) exist for all k.
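The quantities in the abstract can be illustrated with a small Python sketch. It is not taken from the paper: the function names `rewind_cost` and `discrepancy` are hypothetical, and the sketch assumes that time 0 always serves as a free restart point (the full restart) and that the current time T closes the last interval, so that k checkpoints split [0, T] into k + 1 intervals.

```python
def rewind_cost(checkpoints, t):
    """Steps to recompute when rewinding to time t.

    Assumption (not from the paper): time 0 is always available as a
    restart point, which models a full restart.
    """
    restart_points = [0.0] + sorted(checkpoints)
    # Restart from the latest point that is not after t.
    start = max(p for p in restart_points if p <= t)
    return t - start


def discrepancy(checkpoints, now):
    """Longest gap between neighbouring restart points at time `now`,
    divided by the ideal gap now / (k + 1) for k checkpoints."""
    k = len(checkpoints)
    points = [0.0] + sorted(checkpoints) + [now]
    longest_gap = max(b - a for a, b in zip(points, points[1:]))
    return longest_gap / (now / (k + 1))


# Evenly spaced checkpoints reach the ideal ratio of 1.
print(discrepancy([1, 2, 3], now=4))   # 1.0
# If no checkpoint is moved while time advances, the ratio grows.
print(discrepancy([1, 2, 3], now=8))   # 2.5
print(rewind_cost([1, 2, 3], t=2.5))   # 0.5
```

In this reading, a checkpointing strategy with guarantee q keeps `discrepancy(...)` at most q at every time T, using only the operation of replacing an old checkpoint with the current state.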


Similar resources

Cooperative Checkpointing for Supercomputing Systems

A system-level checkpointing mechanism, with global knowledge of the state and health of the machine, can improve performance and reliability by dynamically deciding when to skip checkpoint requests made by applications. This thesis presents such a technique, called cooperative checkpointing, and models its behavior as an online algorithm. Where C is the checkpoint overhead and I is the request...


Online and Random-order Load Balancing Simultaneously

We consider the problem of online load balancing under l_p-norms: sequential jobs need to be assigned to one of the machines and the goal is to minimize the l_p-norm of the machine loads. This generalizes the classical problem of scheduling for makespan minimization (case l_∞) and has been thoroughly studied. However, despite the recent push for beyond worst-case analyses, no such results are know...


Designing smoothing functions for improved worst-case competitive ratio in online optimization

Online optimization covers problems such as online resource allocation, online bipartite matching, adwords (a central problem in e-commerce and advertising), and adwords with separable concave returns. We analyze the worst case competitive ratio of two primal-dual algorithms for a class of online convex (conic) optimization problems that contains the previous examples as special cases defined o...


Exploiting easy data in online optimization

We consider the problem of online optimization, where a learner chooses a decision from a given decision set and suffers some loss associated with the decision and the state of the environment. The learner’s objective is to minimize its cumulative regret against the best fixed decision in hindsight. Over the past few decades numerous variants have been considered, with many algorithms designed ...


The Best of Both Worlds: Stochastic and Adversarial Bandits

We present a new bandit algorithm, SAO (Stochastic and Adversarial Optimal) whose regret is (essentially) optimal both for adversarial rewards and for stochastic rewards. Specifically, SAO combines the O(√n) worst-case regret of Exp3 (Auer et al., 2002b) and the (poly)logarithmic regret of UCB1 (Auer et al., 2002a) for stochastic rewards. Adversarial rewards and stochastic rewards are the two...




Publication date: 2013
